As in the presentation, we will use data from the Public Use File (PUF) of the GESIS Panel Special Survey on the Coronavirus SARS-CoV-2 Outbreak in Germany for this exercise. You should (have) download(ed) the dataset in .csv format into a folder called data within the folder containing the materials for this workshop. Also remember that it is helpful to consult the codebook for the dataset.
That being said, let’s get wrangling…
…but before we can do that, we need to load the tidyverse package(s) and import the data.
library(tidyverse)
gesis_panel_corona <- read_csv2("../data/ZA5667_v1-1-0.csv")
We see here that most of the variable names are not very descriptive, which is something that we might want to change.
Rename the variables hzcy053a to employment_march and hzcy071a to children using base R, and then rename hzcy044a to trust_doctor and hzcy050a to trust_moh using a function from the tidyverse package dplyr.
The base R function for this is colnames(), and the dplyr function is rename().
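If you are unsure how the two approaches differ, the following minimal sketch shows both on a small made-up data frame. The object toy and its column names are invented for illustration and are not part of the GESIS data.

```r
library(dplyr)

# Toy data frame (made up for illustration; not the GESIS data)
toy <- tibble(a1 = c(1, 2), b2 = c("x", "y"), c3 = c(TRUE, FALSE))

# Base R: overwrite the name of the column that is currently called "a1"
colnames(toy)[colnames(toy) == "a1"] <- "score"

# dplyr: rename() expects pairs of the form new_name = old_name
toy <- toy %>%
  rename(group = b2, flag = c3)

colnames(toy)
```

Note that rename() takes the new name on the left and the old name on the right, which is easy to mix up.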
For the remainder of this exercise, we will focus on functions from the tidyverse. Of course, if you want to, you can also use base R to solve the tasks, or, if you are extra ambitious, you can use both.
Create a new object named demo that contains the following variables: sex, age_cat, education_cat, marstat, household, children.
You can use the select() function from dplyr.
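As a reminder of how select() works, here is a small sketch on invented data. The data frame toy and its columns are assumptions made for this example only.

```r
library(dplyr)

# Toy data frame (made up for illustration; not the GESIS data)
toy <- tibble(
  id     = 1:3,
  height = c(170, 182, 165),
  city   = c("Bonn", "Kiel", "Ulm"),
  weight = c(65, 80, 58)
)

# Keep only the listed columns, in the order given
toy_small <- toy %>%
  select(id, height, weight)

names(toy_small)
```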
Filter the demo dataset so that it only contains married men.
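Selecting rows on two conditions at once could look like the sketch below. The numeric codes used here are hypothetical; for the real data you need to look up the actual values of sex and marstat in the codebook.

```r
library(dplyr)

# Toy data frame with hypothetical numeric codes (not the GESIS coding)
toy <- tibble(
  id     = 1:4,
  group  = c(1, 2, 1, 2),
  status = c(1, 1, 3, 1)
)

# Conditions separated by a comma are combined with a logical AND
toy_sub <- toy %>%
  filter(group == 1, status == 1)

toy_sub
```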
Using the naniar package, recode the values -99, -77, -88, -33, -22, and -11 as missing for all variables in the demo dataframe.
NA in all variables.
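One way to do this kind of recoding is naniar's replace_with_na_all(), which applies a condition to every column. The sketch below uses a made-up data frame named toy; whether this is the function intended by the exercise is an assumption.

```r
library(dplyr)
library(naniar)

# Toy data frame containing some of the special codes (made up for illustration)
toy <- tibble(
  x = c(1, -99, 3),
  y = c(-77, 2, -11)
)

# The condition is a formula; .x stands for the values of each column in turn
toy_na <- toy %>%
  replace_with_na_all(condition = ~.x %in% c(-99, -77, -88, -33, -22, -11))

toy_na
```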
demo dataframe? Do not assign the result to an object.
Use a function from the tidyr package to check this. Do not overwrite the demo object.
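If the goal is to count complete cases, one possible approach is to pipe the data through tidyr's drop_na() and then nrow(), without assigning anything. The sketch below assumes this reading of the task and uses an invented data frame toy.

```r
library(dplyr)
library(tidyr)

# Toy data frame with some missing values (made up for illustration)
toy <- tibble(
  x = c(1, NA, 3, 4),
  y = c("a", "b", NA, "d")
)

# Drop every row with at least one NA and count what is left;
# nothing is assigned, so toy itself stays unchanged
toy %>%
  drop_na() %>%
  nrow()
```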
Recode the age_cat variable into an ordered factor with 5 levels called age_5_cat.
The dplyr function between() will be helpful here.
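To illustrate the general pattern, the sketch below collapses a numeric variable into three ordered categories using between() inside case_when(). The variable score, the cut-offs, and the labels are invented for this example; the categories for age_5_cat are for you to choose with the help of the codebook.

```r
library(dplyr)

# Toy data frame (made up for illustration; not the GESIS data)
toy <- tibble(score = c(1, 4, 7, 10))

toy <- toy %>%
  mutate(
    # between(x, left, right) is TRUE when left <= x <= right
    score_3_cat = case_when(
      between(score, 1, 3) ~ "low",
      between(score, 4, 6) ~ "medium",
      score >= 7           ~ "high"
    ),
    # Turn the character labels into an ordered factor
    score_3_cat = factor(
      score_3_cat,
      levels  = c("low", "medium", "high"),
      ordered = TRUE
    )
  )

toy
```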